The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The problem of two-dimensional pattern matching invariant under a given class of admissible transformations is to find in text T matches of transformed versions f(P) of the pattern P, for all f in . In this paper, pattern matching invariant under compositions of real scaling and rotation are investigated. We give a new discretization technique for this class of transformations...
We study the problem of finding, in a given word, all maximal gapped palindromes verifying two types of constraints, that we call long-armed and length-constrained palindromes. For both classes, we propose algorithms that run in time O(n + S), where S is the number of output palindromes. Both algorithms can be extended to compute biological gapped palindromes within the same time bound.
We study the NP-complete Graph Motif problem: given a vertex-colored graph G = (V,E) and a multiset M of colors, does there exist an S ⊆ V such that G[S] is connected and carries exactly (also with respect to multiplicity) the colors in M? We present an improved randomized algorithm for Graph Motif with running time O(4.32|M|·|M|2·|E|). We extend our algorithm to list-colored graph vertices and the...
How to evaluate the quality of models is a basic problem for the field of protein structure prediction. Numerous evaluation criteria have been proposed, and one of the most intuitive criteria requires us to find a largest well-predicted subset — a maximum subset of the model which matches the native structure [12]. The problem is solvable in O(n7) time, albeit too slow for practical usage...
The genomic distance problem in the Hannenhalli-Pevzner theory is the following: Given two genomes whose chromosomes are linear, calculate the minimum number of inversions and translocations that transform one genome into the other. This paper presents a new distance formula based on a simple tree structure that captures all the delicate features of this problem in a unifying way.
We present an algorithm for computing the edit distance of two RNA structures with arbitrary kinds of pseudoknots. A main benefit of the algorithm is that, despite the problem is NP-hard, the algorithmic complexity adapts to the complexity of the RNA structures. Due to fixed parameter tractability, we can guarantee polynomial run-time for a parameter which is small in practice. Our algorithm can be...
A string barcoding problem is defined as to find a minimum set of substrings that distinguish between all strings in a given set of strings . In a biological sense the given strings represent a set of genomic sequences and the substrings serve as probes in a hybridisation experiment. In this paper, we study a variant of the string barcoding problem in which the substrings have to be chosen...
We present probabilistic arithmetic automata (PAAs), which can be used to model chains of operations whose operands depend on chance. We provide two different algorithms to exactly calculate the distribution of the results obtained by such probabilistic calculations. Although we introduce PAAs and the corresponding algorithm in a generic manner, our main concern is their application to pattern matching...
We analyze the lossless data compression scheme using antidictionary. Its principle is to build the dictionary of a set of words that do not occur in the text (minimal forbidden words). We prove here that the number of words in the antidictionary, i.e., minimum forbidden words, behaves asymptotically linearly in the length of the text under a memoryless model on the generation of texts. The linearity...
A string S ∈ Σm can be viewed as a set of pairs S = { (σi , i) : i ∈ { 0,..., m − 1} }. We consider approximate pattern matching problems arising from the setting where errors are introduced to the location component (i), rather than the more traditional setting, where errors are introduced to the content itself (σ...
We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter ε so that the algorithm is allowed to miss occurrences with probability ε. This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the problem...
We present a deterministic black box solution for online approximate matching. Given a pattern of length m and a streaming text of length n that arrives one character at a time, the task is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. Our solution requires $O(\Sigma_{j=1}^{\log_2{m}} T(n,2^{j-1})/n)$ time for each input character,...
Suffix trees are among the most important data structures in stringology, with myriads of applications. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. We present a novel compressed suffix tree. Compared to the existing ones, ours is the first achieving at the same time sublogarithmic complexity for the operations,...
Let G be an unweighted and undirected graph of n nodes, and let D be the n ×n matrix storing the All-Pairs-Shortest-Path distances in G. Since D contains integers in [n] ∪ + ∞, its plain storage takes n2log(n + 1) bits. However, a simple counting argument shows that (n2 − n)/2 bits are necessary to store D. In this paper we investigate the question of finding a succinct representation...
The Sort Transform (ST) can significantly speed up the block sorting phase of the Burrows-Wheeler transform (BWT) by sorting only limited order contexts. However, the best result obtained so far for the inverse ST has a time complexity O(Nlogk) and a space complexity O(N), where N and k are the text size and the context order of the transform, respectively. In this paper, we present a novel algorithm...
Suffix trees are by far the most important data structure in stringology, with myriads of applications in fields like bioinformatics, data compression and information retrieval. Classical representations of suffix trees require O(n logn) bits of space, for a string of size n. This is considerably more than the n log2σ bits needed for the string itself, where σ is the alphabet size. The...
Concept lattices (also called Galois lattices) have been applied in numerous areas, and several algorithms have been proposed to construct them. Generally, the input for lattice construction algorithms is a binary matrix with size |G||M| representing binary relation I ⊆ G ×M . In this paper, we consider polynomial delay algorithms for building concept lattices. Although the concept lattice may be...
An interval [p,q], where 0 ≤ p ≤ q < 2n, can be considered as the set X of n-bit binary strings corresponding to encodings of all integers in [p,q]. A word w with don’t care symbols is matching the set L(w) of all words of the length |w| which can differ only on positions containing a don’t care. A set Y of words with don’t cares is matchingX iff X = ∪ ...
The LCS of two rooted, ordered, and labeled trees F and G is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS which exploit the sparsity inherent to the tree LCS problem. Assuming G is smaller than F, our first algorithm runs in time $O(r\cdot {\rm height}(F) \cdot {\rm height}(G)\cdot \lg\lg |G|)$ , where r is the number of...
The shortest common superstring problem (SCS) has been widely studied for its applications in string compression and DNA sequence assembly. Although it is known to be Max-SNP hard, the simple greedy algorithm works extremely well in practice. Previous researchers have proved that the greedy algorithm is asymptotically optimal on random instances. Unfortunately, the practical instances in DNA sequence...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.